Neuro Based Approach for Speech Recognition by Using Mel-frequency Cepstral Coefficients
Authors
R.L.K. Venkateswarlu (1) and R. Vasanthakumari (2)
(1) Department of Information Technology, Sasi Institute of Technology and Engineering, Tadepalligudem, India. (2) Perunthalaivar Kamarajar Arts College, Puducherry-605107, India.
Abstract
This paper presents a continuous speech recognition system based on neural networks. Features are extracted and the data is compressed using the Mel-frequency cepstral coefficients (MFCC) method. These coefficients are used as inputs to train the neural networks. Neural networks are useful for solving complex problems that do not require an accurate solution. The multilayer perceptron is trained with the backpropagation algorithm, and the solution found by this approach is convergent. This research work is aimed at speech recognition using multilayer perceptron neural networks. A small vocabulary of 11 words was established first: upload, search, browse, import, export, send, remove, attach, help, format, and install. The chosen words correspond to computer functions: export a file or an image; find a file, a folder, or an image; view the data; download some properties; transfer out of a database or document in a format; move a file; delete a file; add a file; get some assistance; delete existing content; add new software. The words are introduced to the computer and then subjected to feature extraction using Mel-frequency cepstral coefficients. These features are used as input to an artificial neural network in speaker-dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system. The system consists of three parts: speech processing, feature extraction, and training and testing using neural networks and information retrieval.
The retrieval process proved to be 81.44%–93.18% successful, which is quite acceptable considering the variation in surroundings, the state of the person, and the microphone type.
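The feature extraction step mentioned in the abstract can be illustrated with a minimal sketch of how MFCCs are commonly computed for a single frame: window the frame, take the power spectrum, apply triangular mel-spaced filters, take logarithms, and apply a discrete cosine transform. The abstract does not state the authors' parameters, so the sample rate (8000 Hz), frame length (256 samples), filter count (20), coefficient count (13), and all function names below are assumptions chosen for illustration. Pre-emphasis, framing of a full recording, and liftering are omitted.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 / N for bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: fine for one short frame, too slow for real workloads.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge, peak of 1 at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    """Return the first num_coeffs MFCCs of one frame (unnormalised DCT-II)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the energies so that log() never sees zero.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))
```

In a recogniser of the kind described, a vector like `coeffs` would be computed for each frame of a spoken word and passed to the multilayer perceptron as input. A useful property of this representation is that scaling the signal amplitude changes only the zeroth coefficient, which is one reason MFCCs tolerate some variation in recording level.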
Similar resources
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
Acoustic Emotion Recognition Using Linear and Nonlinear Cepstral Coefficients
Recognizing human emotions through the vocal channel has gained increased attention recently. In this paper, we study how the features and classifiers used affect the recognition accuracy of emotions present in speech. Four emotional states are considered for classification of emotions from speech in this work. For this aim, features are extracted from audio characteristics of emotional speech using Linea...
Using Mel-Frequency Cepstral Coefficients in Missing Data Technique
Filter bank is the most common feature being employed in the research of the marginalisation approaches for robust speech recognition due to its simplicity in detecting the unreliable data in the frequency domain. In this paper, we propose a hybrid approach based on the marginalisation and the soft decision techniques that make use of the Mel-frequency cepstral coefficients (MFCCs) instead of f...
Speaker Recognition System Based On MFCC and DCT
This paper examines and presents an approach to the recognition of speech signals using frequency spectral information with Mel frequency. It is a dominant feature for speech recognition. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear m...
Hardware Implementation of Speech Recognition Using MFCC and Euclidean Distance
This paper suggests a Digital Signal Processor (DSP) based speech recognition system with improved performance in terms of recognition accuracy and computational cost. A comprehensive survey of various feature extraction approaches, such as Mel filter banks with Mel Frequency Cepstrum Coefficients (MFCC), is presented. This paper describes an approach of isolated speech recognition by Digital Signal Process...